A Note on Validity Generalization Procedures

We have been informed recently that the scientific status of personnel
research will "greatly advance" if the hypothesis that validities are
situationally specific is found to be false (Schmidt, Hunter, & Pearlman,
1982, p. 841; see also Schmidt & Hunter, 1977, 1978, 1980). The
"situational specificity" hypothesis holds that true validities (i.e.,
population correlations unaffected by statistical artifacts, indicated by
$\rho_i$, $i = 1, \ldots, K$ populations) vary as a function of validation
situation (setting, study), or $\sigma_\rho^2 > 0$ (cf. Hunter, Schmidt, &
Jackson, 1982). In contrast to situational specificity, we will use the term
"cross-situational consistency" to refer to conditions in which the $\rho_i$
are constant over K populations (situations) and all variation among observed
validities ($r_i$, $i = 1, \ldots, K$) is attributable to sampling error and
other types of statistical artifacts such as variations in criterion
reliabilities and variances (cf. Hunter & Hunter, 1984; Hunter, Schmidt, &
Jackson, 1982; Hunter, Schmidt, & Pearlman, 1981, 1982; Schmidt, Hunter,
Pearlman, & Shane, 1979; see also Callender & Osburn, 1980, 1981, 1982; Raju
& Burke, 1983). A constant $\rho$ across situations implies that validity is
generalizable or "transportable" (Schmidt et al., 1982, p. 81), although the
term "validity generalization" refers to a less demanding condition in which
"most of the values of estimated true validities...lie in the positive range"
(Schmidt et al., 1982, pp. 840, 841).

The term "validity generalization approach (analysis, procedure)" is
employed here to refer broadly to the assumptions and quantitative techniques
used by the proponents of this approach to contrast cross-situational
consistency with situational specificity. It is noteworthy that proponents
of the validity generalization approach have based their substantive and
statistical developments on structural equations for observed validities,
the objective being to identify causes for variation among validity
coefficients and thereby construct explanatory models for validity
distributions (cf. Schmidt, Gast-Rosenberg, & Hunter, 1980). Indeed, the
attempt to construct useful explanatory models for validity distributions is
made possible because structural equations provide explicit, quantitative
statements of statistical theory regarding the rules that presumably govern
the occurrences of validities. The validity generalization approach also
proposes quantitative methods for assessing the goodness of fit of the
structural equations, that is, for confirming or disconfirming predictions
evolving from the causal models for validities and validity distributions.

The end-products of these tests have serious implications for
industrial-organizational psychologists, an example being that confirmation
of a cross-situational consistency model implies that validity studies may
not have to be repeated in each situation in which a test is used.

Industrial-organizational psychologists are playing for high stakes
here, and rigorous review is needed of the statistical foundations for the
structural equations (i.e., causal models) for validities and validity
distributions, the methods for confirming or disconfirming predictions
evolving from the causal models, and the causal inferences that derive from
results of the confirmatory (i.e., validity generalization) analyses. Our
objective is to furnish at least a partial review. To set the stage for the
review, consider the following quotations, which describe what the key
proponents of the validity generalization approach intended to do and their
perceptions of the results of their efforts.

In order to establish such patterns of relationships, it is first
necessary to demonstrate that the doctrine of situational specificity is
false or essentially false. If the situational specificity hypothesis
is rejected, then it follows that various constructs -- for example,
spatial ability -- have invariant population relationships with
specified kinds of performances and job behaviors (Schmidt et al., 1979,
p. 267, italics added).

Schmidt and Hunter (1977) showed that ignoring sampling error leads to
disastrous results in the area of personnel selection. Because he
ignored the effect of sampling error on the variance of findings across
studies, Ghiselli (1966, 1973) concluded that tests are only valid on a
sporadic basis, that validity varies from one setting to another because
of subtle differences in job requirements that have not yet been
discovered (Hunter & Hunter, 1984, p. 77).

In conclusion, our evidence shows that the validity of the cognitive
tests studied is neither specific to situations nor specific to jobs
(Schmidt & Hunter, 1981, p. 1133).

The evidence from these two studies appears to be the last nail required
for the coffin of the situational specificity hypothesis (Schmidt,
Hunter, Pearlman, & Hirsh, 1984, p. 73).

The problem with these conclusions is that they are stated in a
categorical manner that implies irrefutable evidence in favor of a
cross-situational consistency structural model and against a situational
specificity structural model. Little attention is given to the possibility
that future tests based on different assumptions might disconfirm
cross-situational consistency or at least furnish an alternative view that
explains the data as well as cross-situational consistency. To be specific,
empirical support for a causal theory implies that a theory is a useful
guide to explanation. It does not imply that the theory is true or unique
because (a) empirical analyses usually involve untested assumptions that may
be false, and (b) a set of observed data may be explained equally well by
more than one causal theory (James, Mulaik, & Brett, 1982). These would seem
to be pertinent concerns given that the null hypothesis of situational
specificity had not been rejected in 54% (80 of 151) of the validity
distributions reviewed by Schmidt, Hunter, and colleagues at the time of the
Schmidt et al. (1982) article.

Our objective is to demonstrate that alternative assumptions and views
exist even though validity generalization procedures appear to support a
causal inference that validities are cross-situationally consistent. The
first step toward this objective is to use validity generalization procedures
to show that a cross-situational consistency causal model has a good
empirical fit with a contrived distribution of observed validities. A number
of simplifying assumptions were made with respect to the validity
distribution and the analyses in order to focus on matters of principle. The
simplifying assumptions were: (a) sampling error explains the lion's share
of variation among observed validities (Hunter & Hunter, 1984; Schmidt et
al., 1982) and thus sampling error is the only statistical artifact
introduced into the distribution; (b) the sample size for each sample
($n_i$) is a constant equal to 70, which simplifies equations but retains
reasonable and realistic sampling error (Lent, Aurbach, & Levin, 1971); (c)
only one sample was obtained from each of K situations, which is the
typical case in practice; and (d) sampling was done randomly from a bivariate
normal population underlying each situation.

The second step toward the goal of demonstrating alternative views and
assumptions involves proposing an alternative explanatory model to
cross-situational consistency and then showing that this alternative model
also has a good empirical fit with the (same) contrived distribution of
observed validities. The third and final step is to show that many of the
statistical assumptions on which validity generalization analyses are based
are false. The paper is concluded with recommendations for future uses of
validity generalization procedures and research on cross-situational
consistency versus situational specificity.

An  Overview  of  the  Validity  Generalization  Approach 

The validity generalization approach is a form of a statistical "what
if" scenario. One devises a statistical scenario, applies the scenario to
data on the assumption that the scenario is valid, and ascertains the
statistical consequences. The key "what if" assumption for the validity
generalization procedure is: What if "...the population correlation is
assumed to be constant over studies" (Hunter, Schmidt, & Jackson, 1982, p.
40)? Let's proceed as if this assumption is valid for the 30 contrived
validity coefficients (observed correlations or $r_i$, $i = 1, \ldots, 30$
studies, situations, populations) presented in Table 1. The distribution of
contrived validities was based on the premises that (a) the "true validity"
for tests for many jobs is at least .50 (Hunter & Hunter, 1984; Pearlman,
Schmidt, & Hunter, 1980); (b) sampling error is the only statistical artifact
in operation; and (c) the sampling distribution has a slight negative skew
(for reasons addressed later). In addition, the distribution was purposefully
designed to illustrate a condition in which multiple conclusions could be
drawn regarding causes of variance among the $r_i$, the supposition being
that empirical confirmation of more than one explanatory model precludes
exclusive reliance on a particular explanatory model (e.g., cross-situational
consistency). In this regard, the range of correlations in the contrived
distribution is about the same as the range of simulated true validities used
by Osburn, Callender, Greener, and Ashworth (1983, p. 117) in their
"moderate true variance" distribution. The original (and simplified)
validity generalization (VG) equations based on Hunter, Schmidt, and Jackson
(1982), and their ensuing estimates for the data in Table 1, are presented in
Table 2.


Insert Tables 1 and 2 about here

Equation 1 in Table 2 furnishes the mean observed validity ($\bar{r}$), which
is an estimate of the constant population correlation "$\rho$" if indeed a
constant correlation is a viable alternative to the null hypothesis of
situational specificity (i.e., $\sigma_\rho^2 > 0$). The null hypothesis is
tested by comparing the variance among the observed validities (i.e.,
$s_r^2$, Equation 2) to an estimate of the variance among these validities
that would be expected from sampling error exclusively (i.e.,
$\hat{\sigma}_e^2$, Equation 3). The comparison typically takes the form of
the ratio $\hat{\sigma}_e^2 / s_r^2$, which is:


the proportion of observed variance (the denominator) that is accounted
for by statistical artifacts (the numerator). The numerator in this
ratio is the variance in observed validities predicted from artifacts
alone; the denominator is the observed (computed) variance of these
validities. We have used this ratio to draw conclusions about the
situational specificity hypothesis, that is, the hypothesis that
[$\sigma_\rho^2$] > 0. The rule that we have used in our research is that if
this ratio (expressed as a percentage) is 75% or greater, we reject the
hypothesis that [$\sigma_\rho^2$] > 0. The rationale for this decision rule is
that the remaining artifacts for which we cannot correct are likely to
account for at least 25% of the observed variance (Schmidt et al., 1982,
p. 840; terms in brackets reflect statistical designations used in the
present discussion).

The ratio $\hat{\sigma}_e^2 / s_r^2$ reported in Table 2 is .75 (75%), which
satisfies the VG decision rule. According to this rule, the conclusion
should be that the situational specificity hypothesis ($\sigma_\rho^2 > 0$) is
disconfirmed because essentially 100% of the variation among validities in
Table 1 can be attributed to sampling error and other, unmeasured statistical
artifacts. Moreover, according to the Schmidt et al. (1979, p. 267)
rationale quoted earlier, it follows from rejection of the situational
specificity hypothesis that the predictor construct (e.g., spatial ability)
has "invariant population relationships with specified kinds of performances
and job behaviors." This is cross-situational consistency.
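The mechanics of the VG decision rule can be sketched in a few lines of Python. This is a minimal illustration only, not the Table 2 equations themselves, which are not reproduced here. It assumes equal sample sizes, an observed variance computed with K in the denominator, and the common approximation $(1 - \bar{r}^2)^2 / (n - 1)$ for the sampling-error variance. The function name `vg_ratio` and the ten validities are hypothetical and are not the Table 1 data.

```python
import math


def vg_ratio(validities, n):
    """Return (mean r, observed variance, sampling-error variance, VG ratio).

    Assumes every validity comes from a sample of the same size n and that
    sampling error is the only artifact considered.
    """
    k = len(validities)
    r_bar = sum(validities) / k                            # mean observed validity
    s2_r = sum((r - r_bar) ** 2 for r in validities) / k   # observed variance
    s2_e = (1.0 - r_bar ** 2) ** 2 / (n - 1)               # assumed error-variance form
    return r_bar, s2_r, s2_e, s2_e / s2_r


# Hypothetical validities (NOT the Table 1 values), each with n = 70.
hypothetical_r = [0.34, 0.41, 0.45, 0.48, 0.50, 0.52, 0.55, 0.58, 0.62, 0.65]
r_bar, s2_r, s2_e, ratio = vg_ratio(hypothetical_r, n=70)
print(f"mean r = {r_bar:.3f}, s_r^2 = {s2_r:.5f}, "
      f"sigma_e^2 = {s2_e:.5f}, ratio = {ratio:.2f}")
print("75% rule rejects situational specificity:", ratio >= 0.75)
```

The final comparison applies the 75% rule as quoted from Schmidt et al. (1982); a different threshold can be substituted without changing the rest of the calculation.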

Affirming the consequent. Does it in fact follow that population
relationships are invariant if the situational specificity hypothesis is
rejected? The answer is no. Indeed, the Schmidt et al. (1979) statement
illustrates a form of logical fallacy known as "affirming the consequent"
(cf. James et al., 1982). This logical fallacy occurs when a good fit
between predictions from a causal theory and empirical data is used to infer
that the theory actually and uniquely explains the data. The fallacy of such
an inference is, as noted, that other causal theories may explain the same
data as well as the theory of interest and that assumptions used to conduct
the empirical test may be false. To avoid the fallacy of affirming the
consequent, one notes that (a) empirical support for a causal theory
indicates that the theory furnishes a useful basis for explanation without
(b) inferring that the theory furnishes a unique basis for explanation.

In the present case, the finding that $\hat{\sigma}_e^2 / s_r^2 = .75$
indicates a good empirical fit between the data and a causal theory
(explanatory model) of cross-situational consistency, according to the VG
decision rule, that is. Accepting the VG decision rule as valid for the
moment, the inference should be that cross-situational consistency furnishes
a useful explanation for the observed variance among the $r_i$. The
inference should not be that the population correlations are invariant
because this implies that cross-situational consistency furnishes the only,
or a unique, explanation for the data. Indeed, an exclusive attribution to
cross-situational consistency is an illustration of affirming the consequent
because alternative views (explanations) are consistent with the data in
Table 1 and because untested assumptions can be shown to be false. The issue
of alternative explanations is addressed below. This discussion is followed
by consideration of false assumptions.

Alternative  Explanations 

In the interest of contrast to the VG approach, it is assumed now
that Ghiselli (1966, 1973) was correct in concluding that validity is
situationally specific. Situational specificity is presumed to be due in
part to unknown differences in the measurement (latent structure) models for
job performance and in job requirements over studies (situations) (Ghiselli,
1966, 1973). It is presumed further that situational specificity among
correlations between a person variable predictor (e.g., cognitive skills) and
a criterion (e.g., job performance) is also a potential function of
moderating effects due to variation among situations in variables such as
leadership, reward structures and processes, group cohesiveness, stress and
coping mechanisms for stress, systems norms and values (e.g., conformity,
loyalty), socialization strategies, formal and informal communication nets,
formalization and standardization of structure, and physical environments
(e.g., privacy), to name a few variables. While we presently lack
empirically confirmed, explanatory models for job performance that integrate
situational variables and person variables (cf. James & Jones, 1976), there
is no dearth of basic psychological theories that portray behavior (which
includes job performance) as a function of person variables, situational
variables, and various forms of interactions between person variables and
situational variables, including nonadditive person by situation interactions
(cf. Bowers, 1973; Ekehammar, 1974; Endler & Magnusson, 1976; Lewin, 1938;
Lichtman & Hunt, 1971; Pervin, 1968). It is a simple matter to employ these
theories to develop models in which the correlation between a person
variable and job performance varies as a function of levels or scores on
situational variables. Furthermore, if we postulate that no two situations
have an identical pattern of scores on the situational variables
(moderators), then we may logically entertain the notion that each situation
represents a different (sub)population with a different (sub)population
validity.

We will therefore proceed to implement a situational specificity "what
if" scenario based on the assumption that a unique population validity
underlies each situation (study). We begin with the psychometric analogy
employed by Hunter, Schmidt, and Jackson (1982) to establish a statistical
foundation for VG analysis. The basic structural (causal) equation is:

    $r_i = \rho_i + e_i$                                                  (5)

In terms of the psychometric analogy, $r_i$ is the observed score
(correlation) for a subject (sample) from population (situation) i, $\rho_i$
is the true score (population correlation) for situation i, and $e_i$ is
the random measurement (sampling) error associated with $r_i$. The
situational specificity hypothesis is again $\sigma_\rho^2 > 0$. Here,
however, we have as many populations as we have situations or studies, which
denotes that each of the 30 observed correlations in Table 1 is a single
representation of its specific $\rho_i$. That is, only one random sample
($n_i = 70$) has been drawn from the specific bivariate normal distribution
associated with each situation. We may also view each $r_i$ as a single
realization (sample of one) from a sampling distribution comprised of an
infinite number of independently estimated correlations, where each
correlation is based on a sample of 70 subjects drawn randomly from a
population having $\rho_i$ as a correlation. There are 30 such sampling
distributions.
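The idea that each observed validity is a single draw from its own sampling distribution can be made concrete with a small simulation. The sketch below is illustrative only: the function `sample_correlation`, the seed, and the five population correlations are hypothetical choices and are not taken from Table 1. It draws one sample of 70 from a standard bivariate normal population for each assumed population correlation and reports the resulting observed correlation and its sampling error.

```python
import math
import random


def sample_correlation(rho, n, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho
    and return the sample Pearson correlation r."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        # y shares a proportion rho of x, plus independent noise
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1985)                           # arbitrary seed
hypothetical_rhos = [0.30, 0.40, 0.50, 0.60, 0.70]  # assumed, one per situation
for rho in hypothetical_rhos:
    r = sample_correlation(rho, n=70, rng=rng)      # one sample per situation
    print(f"rho_i = {rho:.2f}   r_i = {r:.2f}   e_i = {r - rho:+.2f}")
```

Rerunning with a different seed gives a different set of observed correlations from the same population values, which is the sense in which each observed validity is a sample of one.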

A point that is typically not considered in VG analysis is that
"reasonable limits" for each $\rho_i$ can be estimated based on the inequality
$r_i - 2(\hat{\sigma}_{e_i}) \le \rho_i \le r_i + 2(\hat{\sigma}_{e_i})$, where
$\hat{\sigma}_{e_i}$ is an estimate of the error of measurement for
population i (Gulliksen, 1950, p. 20). The equation for $\hat{\sigma}_{e_i}$
is addressed later in this paper. To illustrate the use of reasonable limits,
$\hat{\sigma}_{e_i}$ for $r_i = .26$ is .11 and reasonable limits for the
true (population) correlation are .04 to .48. An estimate of
$\sigma_{e_i}$ for $r_i = .72$ is .06, and the reasonable limits for
$\rho_i$ are .60 to .84. If we were to establish reasonable limits for each
of the 30 $\rho_i$ and then view the 30 ranges jointly, we would find that the
joint range of reasonable limits of possible values of the 30 $\rho_i$ varies
from .04 to .84.
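The two worked examples of reasonable limits can be checked with a short calculation. The equation for the error term is deferred to a later part of the paper, so the form used here, $(1 - r_i^2) / \sqrt{n_i - 1}$, is an assumption; it was chosen because it reproduces the values quoted in the text (.11 and .06 with n = 70). The function name `reasonable_limits` is hypothetical.

```python
import math


def reasonable_limits(r, n):
    """Return (sigma_e, lower, upper) for the limits r +/- 2 * sigma_e.

    sigma_e uses an assumed large-sample form, (1 - r^2) / sqrt(n - 1).
    """
    sigma_e = (1.0 - r ** 2) / math.sqrt(n - 1)
    return sigma_e, r - 2.0 * sigma_e, r + 2.0 * sigma_e


for r in (0.26, 0.72):
    sigma_e, lower, upper = reasonable_limits(r, n=70)
    print(f"r = {r:.2f}: sigma_e = {sigma_e:.2f}, "
          f"limits = {lower:.2f} to {upper:.2f}")
```

Because the error term shrinks as the observed correlation grows, the limits are wider around .26 than around .72, which is why the joint range spans .04 to .84.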

We now have two interpretations for the same set of correlations, one
furnished by VG analysis that suggests that the $\rho_i$ are constant and
equal to .50, and one furnished by a situational specificity hypothesis that
suggests that the $\rho_i$ are different and could range between .04 and .84.
The VG analysis reported in Table 2 supports the former, cross-situational
consistency model. What evidence is there for the latter, situational
specificity model? Well, we have an analysis based on chi-square to
test the null hypothesis that $\rho_i = \rho$ for all i, or
$\sigma_\rho^2 = 0$. This test, given in Cohen and Cohen (1975, p. 52; the
equation in Cohen & Cohen, 1983, p. 55 is missing a salient bracket),
furnishes a chi-square value of 42.009, which is significant at the .05 level
using a one-tail test of significance.
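A chi-square test of the homogeneity of correlations can be sketched as follows. The sketch assumes the standard Fisher z form of the test, in which each correlation is transformed to z, the z values are weighted by $n_i - 3$, and the weighted squared deviations from the weighted mean are summed with K - 1 degrees of freedom. Whether this matches the Cohen and Cohen (1975) equation term for term is an assumption. The function name and the data are hypothetical; the Table 1 values are not used.

```python
import math


def homogeneity_chi_square(validities, sample_sizes):
    """Return (chi_square, df) for the null hypothesis of equal rho_i."""
    zs = [math.atanh(r) for r in validities]      # Fisher z transformation
    ws = [n - 3 for n in sample_sizes]            # weights: 1 / var(z) = n - 3
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi_square = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi_square, len(validities) - 1


# Hypothetical validities (NOT the Table 1 values), each with n = 70.
hypothetical_r = [0.34, 0.41, 0.45, 0.48, 0.50, 0.52, 0.55, 0.58, 0.62, 0.65]
chi_square, df = homogeneity_chi_square(hypothetical_r, [70] * len(hypothetical_r))
print(f"chi-square = {chi_square:.3f} on {df} degrees of freedom")
```

The statistic is then referred to a chi-square table at the chosen significance level; the sketch stops short of computing a p value so that it depends only on the standard library.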

Rejection of the null hypothesis of cross-situational consistency
implies that $\sigma_\rho^2 > 0$, or that the validities in Table 1 may be
situationally specific. We must be careful not to affirm our own consequent,
however, and thus we conclude that the results of the present analysis imply
that the distribution of observed validities in Table 1 could have been
generated by a set of different $\rho_i$, plus sampling error. We have no
proof that this is so, but we do have a viable, empirically confirmed
alternative to the assumption that the observed validities were generated by
a constant $\rho$ and sampling error. It may be discomforting to realize that
$\sigma_\rho^2 = 0$ and $\sigma_\rho^2 > 0$ are both viable alternatives. Yet,
multiple and conflicting explanatory models are to be expected in causal, or
confirmatory, analysis (James et al., 1982). When confronted with
conflicting models, the objective is to ascertain if one or more of the
models might be disconfirmed by additional tests, a source of which is
further examination of untested assumptions of one or more of the models.

Presented below is an examination of the assumptions underlying the VG
decision rule that $\sigma_\rho^2 > 0$ should be rejected when
$\hat{\sigma}_e^2 / s_r^2 \ge .75$.

A comparative analysis of power. Hunter, Schmidt, and Jackson
(1982, p. 47) did not endorse their form of a chi-square test, which
furnishes a chi-square value of 41.07 for the data in Table 1, because the
chi-square test "has very high statistical power and will therefore reject
the null hypothesis [of cross-situational consistency] given a trivial amount
of variation across studies." In the interest of fairness, we believe that
it is important also to evaluate the power of the VG ratio
$\hat{\sigma}_e^2 / s_r^2$ in regard to rejecting the null hypothesis of
situational specificity. A recent simulation study by Osburn et al. (1983)
suggested that the decision rule to reject the situational specificity
hypothesis when $\hat{\sigma}_e^2 / s_r^2 \ge .75$ results in too much power in
the sense that situational specificity is rejected when low to moderate
variance exists among the $\rho_i$ (true validities), given that the $n_i$
are not large (< 100). We wish to address this point with some logic and
simple algebra within the context that sample sizes are not large (e.g.,
$n_i = 70$) and for the critical value of the VG decision rule (i.e.,
$\hat{\sigma}_e^2 / s_r^2 \ge .75$). Consider the following statement by
Schmidt et al. (1982, p. 844):
"We have found that, except when study sample sizes are very large, most of
the variance in observed correlations that is due to artifacts is due to only
one artifact -- simple sampling error." To illustrate this point, Schmidt et
al. (1982) reported that an average of 90% of all of the variance due to
measurable artifacts was attributable to sampling error in two studies that
employed the Schmidt and Hunter "interactive equation," which uses a
simultaneous procedure to estimate variance due to measurable artifacts.
Measurable artifacts included between-study differences in sampling error,
range restriction, criterion reliability, and predictor reliability. These
points suggest that for a VG ratio equal to the critical value of .75, we
would attribute 67.5% [i.e., .90(.75)100] of the total observed variance
(i.e., $s_r^2$) to sampling error and 7.5% [i.e., .10(.75)100] of the total
observed variance to the remaining three measurable artifacts.
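The arithmetic behind this partition is easy to lay out explicitly. The short sketch below takes the critical VG ratio (.75) and the reported share of measurable-artifact variance due to sampling error (.90) as inputs and returns the three percentages of total observed variance. The function name `partition_observed_variance` is hypothetical, and the unmeasured remainder is simply one minus the VG ratio, which is the attribution made under VG logic.

```python
def partition_observed_variance(vg_ratio=0.75, sampling_share=0.90):
    """Return percentages of observed variance attributed to
    (sampling error, other measurable artifacts, unmeasured remainder)."""
    sampling = sampling_share * vg_ratio * 100.0                # .90(.75)100
    other_measured = (1.0 - sampling_share) * vg_ratio * 100.0  # .10(.75)100
    unmeasured = (1.0 - vg_ratio) * 100.0                       # remainder
    return sampling, other_measured, unmeasured


sampling, other_measured, unmeasured = partition_observed_variance()
print(f"sampling error:            {sampling:.1f}%")
print(f"other measured artifacts:  {other_measured:.1f}%")
print(f"unmeasured remainder:      {unmeasured:.1f}%")
print(f"unmeasured / other measured = {unmeasured / other_measured:.2f}")
```

The last line gives the ratio of the unmeasured remainder to the variance attributed to range restriction, criterion reliability, and predictor reliability combined.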

The remaining 25% of the observed variance is regarded as being caused
by unmeasured statistical artifacts according to VG logic. Remember, a VG
ratio = .75 implies $\sigma_\rho^2 = 0$ because 25% of the variance in
$s_r^2$ can be attributed to unmeasured artifacts (Schmidt et al., 1982,
p. 840). Unmeasured artifacts involve (a) between-study differences in
criterion contamination and deficiency; (b) clerical errors in computation,
typing, and transcription; and (c) "slight differences in the factor
structure of tests designed to measure the same construct" (Schmidt et al.,
1982, p. 840).

We find it incongruous that the variance attributed to criterion
contamination and deficiency, clerical errors, and "slight" differences in
test factor structures exceeds the variance attributed to range
restriction, criterion reliability, and test reliability by a factor greater
than 3 (i.e., .25/.075 = 3.33 at the critical value of the decision rule).

The adjective "slight" in describing differences in factor structures of
tests is well taken, for if factor structures of tests vary among studies,
then the VG analysis is mixing apples with bicycles. But how does one
operationalize "slight" in regard to a point-estimate of variance
attributable to this artifact? Well, a reasonable heuristic might be to
interpret "slight" to mean approximately 10% of all of the variance
attributed to unmeasured artifacts, or .10(.25)100 = 2.5%. This suggests
also that 2.5% of the total variance among the $r_i$ (i.e., $s_r^2$) could
be attributed to "slight" differences in factor structures of tests, which if
anything, seems generous given that only 7.5% of this variance is attributed
to between-study differences in criterion reliability (CR), predictor
reliability (PR), and range restriction (RR).

The rationale above means that approximately 22.5% of $s_r^2$
should be viewed as being caused by the unmeasured artifacts of criterion
contamination and deficiency (CCD) and clerical errors (CE). But is all of
the variance among the $r_i$ attributed to CCD and CE truly unmeasured? We
think not because many of the causes of CCD and CE that affect between-study
differences in validities are also likely to influence between-study
differences in reliabilities. The logical and statistical progression is
that at least some of the causes of $s_r^2$ attributed to CCD and CE are in
truth already measured and included in variance among the $r_i$
attributed to CR and PR.

To illustrate, criterion contamination involves (a) systematic biases
evolving from factors such as opportunity bias, rater bias, group
characteristic bias, and knowledge of predictor bias; and (b) random error
(Blum & Naylor, 1968). Between-study differences in random errors are
obviously included in variation among the $r_i$ attributed to
between-study differences in CR. Variation in systematic biases over studies
also influences variations in CR (cf. Guion, 1965; James, Demaree, & Wolf,
1984) and therefore variation among the $r_i$. Much the same can be said
for clerical errors. Be these errors systematic and/or nonsystematic and
involved in criterion and/or predictor measurement, they should influence
variation in validities via variation in CR and PR, respectively.

In sum, it is our belief that variation among the $r_i$ attributed to
the unmeasured artifacts of CCD and CE is at least in part represented in the
measured artifacts of CR and PR. This suggests (to us at least) that a VG
decision rule which apportions roughly 22.5% of $s_r^2$ to the unmeasured
artifacts of CCD and CE is seriously flawed, given that (a) a significant and
likely substantial portion of the variance in validities attributed to CCD
and CE is already represented in the measured artifacts of CR and PR, and (b)
CR and PR, plus RR, account for only 7.5% of $s_r^2$ at the critical value
of the decision rule. We propose, therefore, that a more reasonable
hypothesis is that variance among the $r_i$ due to truly unmeasured
portions of the CCD and CE artifacts is unlikely to be greater than variance
among the $r_i$ that is caused by the measured artifacts CR, PR, and
RR, that is, 7.5% of $s_r^2$. So, if we estimate variance among the $r_i$
due to truly unmeasured sources in CCD and CE at 7.5%, and add to this the
variance in the $r_i$ due to "slight" differences in the factor structures of
tests (i.e., 2.5%), we have approximately 10% of $s_r^2$ attributable to
unmeasured artifacts.

Proceeding on this basis suggests that given a VG ratio = .75, we should
add .10 to the ratio to account for unmeasured artifacts (i.e., attribute 85%
of the observed variance to artifacts). This leaves 15% of the variance
attributable to $\sigma_\rho^2$. To reduce this value to zero (that is, to
define a VG decision rule that more realistically implies that
$\sigma_\rho^2 = 0$) requires that we add .15 to .75. Thus, it is our
recommendation that a VG decision rule of .90 should replace the present
decision rule of .75. Naturally, this rule should be revised as research
accumulates regarding empirical estimates of independent variance due to
CCD, CE, and factor structures of tests. On the other hand, to leave the VG
decision rule at .75 is to invite rejection of the null hypothesis that
$\sigma_\rho^2 > 0$ when, according to the heuristics above, $\sigma_\rho^2$
could account for approximately 15% of the observed variance.
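
The bookkeeping behind the proposed rule can be laid out as a short sketch.
The constant and function names below are illustrative labels for the
percentages used in the text, not part of any published VG procedure, and
the percentages themselves are the heuristic estimates given above rather
than empirical values.

```python
# Heuristic bookkeeping for the proposed VG decision rule (illustrative only).
# All quantities are percentages of the observed variance in validities.

CURRENT_RULE = 75.0        # present VG decision rule (.75)
UNMEASURED_CCD_CE = 7.5    # heuristic share for truly unmeasured CCD and CE
FACTOR_STRUCTURE = 2.5     # heuristic share for "slight" factor-structure differences

unmeasured = UNMEASURED_CCD_CE + FACTOR_STRUCTURE        # about 10%
artifact_share = CURRENT_RULE + unmeasured               # about 85%
left_for_true_variance = 100.0 - artifact_share          # about 15%
proposed_rule = CURRENT_RULE + left_for_true_variance    # about 90%


def infer_zero_true_variance(vg_ratio_percent, rule=proposed_rule):
    """Return True when the VG ratio (in percent) meets the decision rule."""
    return vg_ratio_percent >= rule


print(unmeasured, artifact_share, left_for_true_variance, proposed_rule)
print(infer_zero_true_variance(75.0))  # a ratio of .75 under the proposed rule
```

Working in percentages keeps the arithmetic exact in floating point; a ratio
that just meets the present rule of .75 does not meet the proposed rule.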

Summary and conclusions. The primary conclusions based on the
preceding discussion are (a) VG procedures do not furnish irrefutable
evidence of cross-situational consistency, and to imply that they do is to
commit the logical fallacy of affirming the consequent; and (b) the VG
decision rule of $\hat{\sigma}_e^2 / s_r^2 \geq .75$ should be replaced with
$\hat{\sigma}_e^2 / s_r^2 \geq .90$, given that samples are not large and
that the measurable artifacts are sampling error, range restriction,
criterion reliability, and predictor reliability. Adopting a decision rule
of .90 should reduce conflicts between the results of different types of
analyses, such as between VG analysis and chi-square tests of the
homogeneity of population correlations. Applied to the data in Table 1, for
example, a decision rule of .90 would fail to disconfirm the null hypothesis
that $\sigma_\rho^2 > 0$, thus leaving $\sigma_\rho^2 > 0$, which was
confirmed by the chi-square analysis, as the most useful explanation for the
observed variation among the $r_i$.

The example above illustrates that a decision rule of .90 will likely reduce
the percentage of occasions on which an inference that $\sigma_\rho^2 = 0$
is warranted from the present 46% of validity distributions
(Schmidt et al., 1982) to a lower, perhaps much lower, percentage of validity
distributions. It follows that the hypothesis of situational specificity is
alive and well (was it ever not?). However, the recommended change to a
decision rule of .90 may stimulate the cry that (a) bias in favor of a
finding of cross-situational consistency is being replaced with bias in favor
of situational specificity, and/or (b) one heuristic decision rule (VG ratio
$\geq$ .75) which lacks corroborative evidence is merely being replaced with
another heuristic decision rule (VG ratio $\geq$ .90) that is equally lacking
in corroborative evidence. In response, we largely agree with the latter
point and again underscore the need for research designed to identify an
empirically defensible decision rule. In the interim, we believe that a
decision rule of .90 is more reasonable than a decision rule of .75 for the
reasons stated in the development of the recommended change to a rule of .90.
Finally, we refer again to the Osburn et al. (1983) simulation study, which
clearly supported the need for a decision rule more stringent than .75.

False  Assumptions 

It was noted briefly that the statistical foundation for the VG
analytic procedures is furnished by psychometric analogy and structural
equations (cf. Hunter, Schmidt, & Jackson, 1982; Schmidt et al., 1980).
The fundamental structural equation is $r_i = \rho_i + e_i$, where
$r_i$ is the observed score for sample (subject) $i$, $\rho_i$ is the
population correlation (true score) for sample (subject) $i$, and $e_i$ is
the sampling (random measurement) error for sample (subject) $i$. Like the
psychometric equation on which it is based, this equation is
underidentified. That is, for each sample we have one piece of known
data ($r_i$) and two pieces of unknown data ($\rho_i$ and $e_i$). We thus
have one equation in two unknowns, the result of which is no unique
mathematical solution for either unknown. Adding new samples from
different populations does not help because each new sample contributes one
known and two unknowns, not to mention a new population and a new sampling
distribution. (This discussion and that to follow are based on the usual
case of one sample per situation or population. If it were possible to
obtain many independent, random samples from each situation, then not
only could each $\rho_i$ be estimated, but also the total variance among all
observed correlations could be decomposed empirically into
between-situation variance and within-situation variance, using classic
ANOVA paradigms. Unfortunately, the rarity of many independent samples
from each of two or more different situations requires that we proceed with
but one of a theoretically infinite number of samples from each of K
populations and sampling distributions.)¹

Similar to classic psychometrics (cf. Gulliksen, 1950; Lord & Novick,
1968), the VG approach proceeds with the underidentified structural
equation and employs a set of assumptions that make possible the estimation
of moments of the unobservable (latent) true scores and error scores in
terms of moments of the observed $r_i$. Given the basic structural
equation $r_i = \rho_i + e_i$, it is assumed that (a) the mean error is zero
within each study (population, situation), (b) $\rho_i$ and $e_i$ are
unrelated across studies, and (c) $\sigma_r^2 = \sigma_\rho^2 + \sigma_e^2$
(Hunter, Schmidt, & Jackson, 1982). Furthermore, implicit in the use of
several VG estimating equations is the assumption that the errors are
normally distributed and/or the assumption that the within-study error
variances are homogeneous. Each of these assumptions is discussed in
greater detail below, where it is shown that all of the assumptions above
are false if the $\rho_i$ vary or could vary.

Nonnormality of error distributions. A theoretical sampling
distribution of observed correlations ($r_{ia}$) exists for each $\rho_i$,
where $i$ again references 1, ..., K populations and $a$ refers to
1, ..., A observed correlations in the sampling distribution for each
$\rho_i$ (technically, $A \to \infty$). The variance among the $r_{ia}$ for
a particular $\rho_i$ is designated $\sigma_{r_i}^2$. Given that
$r_{ia} = \rho_i + e_{ia}$ in a given sampling distribution and that
$\rho_i$ is a constant in that population, it follows that
$\sigma_{r_i}^2 = \sigma_{e_i}^2$, where $\sigma_{e_i}^2$ is the error
variance for the sampling distribution associated with $\rho_i$. (Note that
$\sigma_r^2$ and $\sigma_e^2$ refer to variances over K populations.) The
equation employed in VG analysis to estimate error variance for a sampling
distribution derives from the equation:

$\sigma_{e_i}^2 = (1 - \rho_i^2)^2 / n_i$    (6)

which assumes a large sample (see below) drawn from a bivariate normal
population with correlation coefficient $\rho_i$ (Kendall & Stuart, 1969,
1973).

The sample estimating equation based on the single $r_{ia}$, or $r_i$,
used in VG analysis is:

$\hat{\sigma}_{e_i}^2 = s_{r_i}^2 = (1 - r_i^2)^2 / (n_i - 1)$    (7)

which is presented and discussed by Ezekiel and Fox (1959) and Fisher
(1954) (Fisher also uses ($n_i - 1$) in the denominator of Equation 6).
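
Equations 6 and 7 can be written as two small functions. This is a minimal
sketch; the function names are our own labels, and the example input
(a correlation of .50 with a sample of 70) is chosen only for illustration.

```python
def error_variance_population(rho, n):
    """Equation 6: large-sample variance of r for population correlation rho."""
    return (1.0 - rho ** 2) ** 2 / n


def error_variance_sample(r, n):
    """Equation 7: sample estimate of error variance from a single observed r."""
    return (1.0 - r ** 2) ** 2 / (n - 1)


# Example: r = .50 observed in a sample of n = 70.
print(round(error_variance_sample(0.50, 70), 4))  # 0.0082
```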

Kendall and Stuart (1973, p. 304) contended that the use of Equation
6 (and by implication Equation 7) to estimate the variance of a sampling
distribution "is of little value in practice since the distribution of r
tends to normality so slowly [cf. Kendall & Stuart, 1969, p. 388]: it is
unwise to use it for [$n_i$] < 500." Fisher (1954) suggested that
Equation 7 should not be used for an $n_i$ < 100. The rationale for these
statements is that when $\rho_i$ departs from zero and $n_i$ is not large
(e.g., < 100 or 500, depending on the reference), then the distribution of
the $r_{ia}$ is skewed. In particular, the distribution is negatively
skewed for positive $\rho_i$. For a given $n_i$, such as 70, the degree of
skew, as well as kurtosis, increases as the (absolute) value of $\rho_i$
increases (Ezekiel & Fox, 1959; Fisher, 1954; Kendall & Stuart, 1969, 1973;
Muirhead, 1982). In general, the sampling distribution tends toward
normality, but very slowly, as $n_i$ increases, although with very large
$\rho_i$ the distribution remains nonnormal even with large $n_i$.
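
The negative skew can be seen in a small Monte Carlo sketch. The setup below
(a population correlation of .50, samples of 70, 4,000 replications, and the
random seed) is an assumption chosen for illustration; it uses only the
Python standard library and is not a reanalysis of any data in this paper.

```python
import math
import random


def sample_r(rho, n, rng):
    """Draw n pairs from a bivariate normal with correlation rho; return Pearson r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def skewness(values):
    """Moment coefficient of skewness (third central moment over variance^1.5)."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


rng = random.Random(12345)                     # fixed seed for repeatability
rs = [sample_r(0.50, 70, rng) for _ in range(4000)]
mean_r = sum(rs) / len(rs)
skew = skewness(rs)
print(f"mean r = {mean_r:.4f}, skewness = {skew:.3f}")
```

Rerunning with a larger `rho` or a smaller `n` should make the skew more
pronounced, in line with the references cited above.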

Focusing on positive $\rho_i$, the reason for negative skews is simple;
$\rho_i$ is bounded by 1.00. The problem is therefore most pronounced for
very large $\rho_i$. Nevertheless, even with moderate $\rho_i$ and $n_i$ <
100 or 500, a ramification of negative skews in sampling distributions for
most if not all of the correlations in Table 1 is that the estimate of
error variance for each of the K samples is less than it should be (cf.
Fisher, 1954). It follows that the estimate of expected error variance
furnished by Equation 3 in Table 2 is also less than it should be. We can
correct these estimates by using the asymptotic expansion furnished by
Ghosh (1966) for estimating the variance of the $r_{ia}$ for a single
sampling distribution. Unfortunately, this equation is too complex to
present here. In general, however, the values furnished by the Ghosh
(1966) equation for the correlations in Table 1 are only slightly larger
than those furnished by Equation 7, given $n_i$ = 70. For example, with
$\rho_i$ = .50, $\hat{\sigma}_{e_i}^2$ is .0083 based on the Ghosh (1966)
equation and .0082 based on Equation 7.

The differences between the correct estimates furnished by the Ghosh
(1966) equation and those furnished by Equation 7 are trivial (for these
data), and one may argue that the practical approach is to employ Equation
7 to estimate error variance. But a plea for pragmatism (and expediency)
is confronted with the problem that VG procedures were developed to explain
why observed validities vary over situations by testing causal models for
observed validities and distributions of observed validities. Indeed,
causal models and structural equations are presumably employed in VG
analysis in order to "greatly advance" the scientific status of personnel
research. But scientific explanation is not "greatly advanced"
by relying on an equation (Equation 7) that statisticians have shown to be
flawed, however trivial the flaw, for small samples and $\rho_i \neq 0$, the
key constituents of VG analysis.

Yet an important commodity in science is time, and the time and
difficulty required to use the Ghosh (1966) equation versus Equation 7 are
compelling forces to proceed with the pragmatic, indeed parsimonious, use
of Equation 7 to estimate error variance. However, a call for parsimony
and pragmatism is not a defensible position in this case because it is
unnecessary. Specifically, a minimal amount of time spent in converting
the $r_i$ to Fisher z coefficients (zs) would help to resolve not
only the problem of nonnormal distributions (distributions of zs
approach normality much more rapidly than Pearson rs) but also most
of the statistical errors discussed below. Thus, we do not urge the
use of the Ghosh (1966) equation or any other equation based on
correlations. We will recommend the use of zs in VG analysis. Before
developing these points, however, it is necessary to document other
problems with the use of rs in VG analysis.

Nonzero expected values of errors. Hunter, Schmidt, and Jackson
(1982, p. 43) state that "Since the mean error is zero within each study,
the error variances across studies in [sic] the average within study
variance." The first problem with this statement is that the mean
within-study (within-population) error is not equal to zero with skewed
distributions. This point derives from the well-known fact that an
$r_{ia}$ is a biased indicator of its respective $\rho_i$ (cf. Muirhead,
1982). The expected value for $e_{ia}$ is:

$E(e_{ia}) = E(r_{ia}) - \rho_i$
$= \{\rho_i - [\rho_i(1 - \rho_i^2)] / 2[n_i - 1]\} - \rho_i$
$= -[\rho_i(1 - \rho_i^2) / 2(n_i - 1)]$    (8)

This derivation is based on Muirhead's (1982) equation for $E(r)$ and
involves deletion of a term $O(n^{-2})$ from the $E(r)$ equation. Equation
8 suggests that the mean error within each study takes a negative value for
positive $\rho_i$, which is expected for negatively skewed sampling
distributions. It suggests also that if the $\rho_i$ vary, then the
$E(e_{ia})$ will also vary because $E(e_{ia})$ is a function of $\rho_i$.
This connotes that some variation among the $r_i$ over studies could be
due to variation among the means of the within-study errors. Finally,
given that the $E(e_{ia})$ are a function of the $\rho_i$, the possibility
exists that the $e_i$ and $\rho_i$ in the equation $r_i = \rho_i + e_i$
are related (A = 1 in this equation). We cannot show this directly
with our illustrative data because the $\rho_i$ are unknown (i.e., we have
only reasonable limits). We may, however, develop another illustration.

A hypothetical distribution of 14 $\rho_i$ (true validities) is presented
in Table 3. The $\rho_i$ vary between .05 and .70. Values of $E(e_{ia})$ are
given for each $\rho_i$; these values are based on $n_i$ = 70 for all
samples. The values of the $E(e_{ia})$ are of small magnitude, which
suggests minimal bias in variance estimates because of failure to consider
variation in the expected errors. More important is the curvilinear
relation between the $\rho_i$ and the $E(e_{ia})$. Technically,
$E(e_{ia})$ assumes its maximum absolute value at $\rho_i$ = .58,
approximately. As $\rho_i$ increases from .05 to .58, the values of the
$E(e_{ia})$ become increasingly negative; as $\rho_i$ increases beyond .58,
the values of the $E(e_{ia})$ become decreasingly negative. Inasmuch as the
true validity for a single test would not be expected to exceed .58 very
often, we might assume that the $\rho_i$ and $E(e_{ia})$ are generally
negatively related. We pursue this point below.
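
The pattern described above can be reproduced directly from Equation 8. The
sketch below is illustrative: the evenly spaced grid of 14 values from .05
to .70 is our assumption (the entries of Table 3 are not reproduced here),
and the function name is our own.

```python
def expected_error(rho, n):
    """Equation 8: approximate E(e_ia) = -rho(1 - rho^2) / (2(n - 1))."""
    return -rho * (1.0 - rho ** 2) / (2.0 * (n - 1))


N = 70
# Assumed grid of 14 hypothetical true validities: .05, .10, ..., .70.
rhos = [round(0.05 * k, 2) for k in range(1, 15)]
table = [(rho, expected_error(rho, N)) for rho in rhos]
for rho, bias in table:
    print(f"rho = {rho:.2f}   E(e) = {bias:.5f}")

# Locate the rho at which the expected error is most negative (fine grid search).
fine_grid = [k / 1000 for k in range(1, 1000)]
rho_at_extreme = min(fine_grid, key=lambda rho: expected_error(rho, N))
print(f"most negative expected error near rho = {rho_at_extreme:.2f}")
```

Analytically, the turning point falls where $1 - 3\rho^2 = 0$, that is, at
$\rho = 1/\sqrt{3}$, which agrees with the value of approximately .58 given
in the text.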


Insert  Table  3  about  here 


Nonhomogeneous error variances. The equation for the variance of
errors for a sampling distribution (Equation 6) shows that
$\sigma_{e_i}^2$ varies as a function of $\rho_i$. For positive $\rho_i$,
$\sigma_{e_i}^2$ is inversely related to the magnitude of $\rho_i$. This
results in violation of the usual assumption in psychometrics that error
variances associated with different true scores are homogeneous. A more
important consideration, however, is the inverse relation between the
$\rho_i$ and the $\sigma_{e_i}^2$ and the implications of this relation for
independence between the $\rho_i$ and the $e_i$.

Nonindependence between true scores and error scores. Two critical
equations in VG analysis, namely the VG ratio $\hat{\sigma}_e^2 / s_r^2$ and
"est $\sigma_\rho^2 = \sigma_r^2 - \sigma_e^2$" (Hunter, Schmidt, & Jackson,
1982, p. 44), are contingent on the assumption that
$\sigma_r^2 = \sigma_\rho^2 + \sigma_e^2$, which in turn is based on the
assumptions that the $\rho_i$ and $e_i$ are independent (over K populations
or studies) and that $E(r_i) = \rho_i$. It was noted that generally
$E(r_i) \neq \rho_i$, and thus we now turn to the assumption that $\rho_i$
and $e_i$ are independent. Lord (1960, p. 94) referred to the assumption of
independence between true scores and error scores as the "independence
hypothesis." Lord (1960, p. 91) also noted that the key concern is the
"hypothetical bivariate scatterplot between true scores and errors of
measurement," which "cannot be constructed empirically." A similar
rationale applies here; we wish to know the relation between the $\rho_i$
and the $e_i$. We cannot construct a bivariate scatterplot because we do
not know the values of either the $\rho_i$ or the $e_i$ inasmuch as the
equation $r_i = \rho_i + e_i$ is underidentified. We may, however, address
lack of independence by other means. For example, a hypothetical set of
$\rho_i$ and $E(e_{ia})$ values in Table 3 implied nonindependence between
the $\rho_i$ and $e_i$. This issue is now treated using procedures
furnished by Lord (1960) and Lord and Novick (1968).
These authors recommended the use of third-order moments to
test the independence hypothesis. The test of concern is based on the
covariance between the $\rho_i$ (in deviation form) and conditional error
variances (i.e., the $\sigma_{e_i}^2$), or Cov($\rho_i$, $\sigma_{e_i}^2$)
(Lord & Novick, 1968, p. 229). If the $\rho_i$ are independent of the
$e_i$, then Cov($\rho_i$, $\sigma_{e_i}^2$) = 0. But this is obviously not
the case because, as discussed, not only are the $\sigma_{e_i}^2$
nonhomogeneous, but also the $\sigma_{e_i}^2$ vary inversely as a function
of the $\rho_i$. Thus, for positive $\rho_i$, Cov($\rho_i$,
$\sigma_{e_i}^2$) assumes a nonzero, negative value, from which we can
conclude that the $\rho_i$ and $e_i$ are nonindependent. It follows
that the equation $\sigma_r^2 = \sigma_\rho^2 + \sigma_e^2$ is in error and
that the statistical foundation for such things as the VG ratio is also in
error.
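
A short sketch makes the sign of this covariance concrete. It computes
Cov($\rho_i$, $\sigma_{e_i}^2$) directly for an assumed set of hypothetical
true validities, using Equation 6 for the conditional error variances. The
grid of values and the function names are our assumptions; with real data
the $\rho_i$ are unknown, which is why Lord and Novick work through
third-order moments of observed scores instead.

```python
def error_variance(rho, n):
    """Equation 6: conditional error variance for population correlation rho."""
    return (1.0 - rho ** 2) ** 2 / n


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


N = 70
rhos = [0.05 * k for k in range(1, 15)]            # assumed: .05 through .70
variances = [error_variance(rho, N) for rho in rhos]

cov = covariance(rhos, variances)
print(f"Cov(rho, error variance) = {cov:.6f}")      # negative for positive rho
```

Setting every element of `rhos` to the same value makes the covariance zero,
which corresponds to the cross-situational consistency case.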

Satisfying the assumptions. The attempt to use the classic
psychometric model to build a statistical foundation for VG analysis results
in violation of many of the classic model's assumptions. Now, if
$\rho_i = \rho$ (i.e., $\sigma_\rho^2 = 0$), as the VG proponents assume,
then many of the problems discussed in regard to false assumptions
dissolve. For example, the error variances are homogeneous because all
$\rho_i$ are the same (for constant $n_i$). However, assuming $\rho > 0$,
the sampling distributions are still likely to be negatively skewed, the
expected errors are not zero, and the error variance is underestimated
using the VG equation. The primary result of these problems is a small bias
in favor of a finding of situational specificity. Yet, the VG procedures
remain troublesome because they are represented as a test of the hypothesis
that $\sigma_\rho^2 > 0$. But the moment we necessarily entertain the
possibility that the $\rho_i$ vary, we must also allow for the possibility
of heterogeneous error variances and nonindependence between the $\rho_i$
and $e_i$. Thus, under the stated basis for the null hypothesis, and making
the not unrealistic assumption that the $\rho_i$ are never precisely equal,
it follows directly that the VG procedures furnish biased estimates because
statistical assumptions are not satisfied.

How serious is the bias? An example presented shortly indicates a small
overall bias in favor of a finding of cross-situational consistency for the
illustrative data in Table 1. Other investigators have addressed at least
some of the assumptions for VG analysis and have concluded that (a) sampling
distributions of observed correlations are "approximately normal" (Pearlman
et al., 1980, p. 381) or "close to normal" (Schmidt et al., 1981, p. 174)
except for very large values of $\rho$, which implies little or no bias due
to skewed sampling distributions for selection studies at least; and (b)
nonindependence between the $\rho_i$ and $e_i$ results in minor
underestimation of the value of $\sigma_\rho^2$ (Burke, 1984; Linn & Dunbar,
1982). Callender, Osburn, Greener, and Ashworth (1982) used Monte Carlo
techniques to show that a skewed distribution of hypothetical $\rho_i$ had
no influence on estimates of $\sigma_\rho^2$. The general conclusion,
therefore, appears to be that whatever bias exists in VG analysis is small
and, pragmatically, has little influence on results. Consequently, one may
proceed with VG analysis without grave concern for bias introduced by
violations of assumptions.

A reasonable opinion, but not one that we share. To reiterate briefly,
our view is that if (a) an avowed objective of using VG procedures is to
advance the scientific status of personnel research (cf. Schmidt & Hunter,
1977), then (b) the VG procedures should stand up to scientific scrutiny.
While science is not blind to the need for pragmatics and occasional
expediency, the pattern of formal statistical error after formal statistical
error does little to advance the scientific merits of the VG enterprise.
Even if the overall degree of bias is trivial, it would hardly do to attempt
to promote the VG procedures as a scientific advancement when pragmatics are
the justifications for violations of almost every statistical assumption of
the model. Most importantly, it is unnecessary to have to rely to this
degree on pragmatics because a simple solution exists that reduces the bias
and increases the scientific precision of the VG procedure.

As noted briefly, the simple solution is to transform the observed
validities ($r_i$) into Fisher z coefficients and to base the VG analysis
on these coefficients. For sample sizes greater than 50, the sampling
distribution of zs is approximately normal, irrespective of the value of
$\rho_i$ (Kendall & Stuart, 1969, who also present estimation equations for
$n_i \leq 50$). Furthermore, $\sigma_{e_i}^2$ based on zs is essentially
independent of the value of $\rho_i$ because all $\sigma_{e_i}^2$ have an
estimated value of $1/(n_i - 3)$ (for constant $n_i$; variable $n_i$ is
addressed by weighting in VG analysis). A very slight bias may persist if
$E(e_{ia})$ based on z coefficients is not zero, but this is an
approximation that we can live with (see Hotelling, 1953 and Kendall &
Stuart, 1969 for further discussions of this issue).

In any event, the use of z coefficients in place of correlation
coefficients places VG analysis on a sounder statistical footing even though
it does not ameliorate all of the statistical problems. Interestingly,
Schmidt and Hunter (1977) originally used zs in VG analysis to ensure
against covariation between the $\rho_i$ and $e_i$, but switched to observed
correlations under the assumption that their formula for sampling error was
"very accurate" (Schmidt et al., 1980, p. 660). Later, the reason for the
switch from zs to rs was given as "the effect of Fisher's z
transformation is to assign extra weight to large observed validity
coefficients" (Schmidt et al., 1982, p. 839). We interpret the term "assign
extra weight" to mean that the difference between the value of z and the
value of r increases in absolute value as the value of r increases.
This, of course, is the price one pays to achieve a sampling distribution of
zs that approaches normality more quickly than a sampling distribution of
rs. It also suggests that the variance among the zs will be greater than
the variance among the rs and that the VG ratio will tend to be lower for
zs than for rs. These points are illustrated by a reanalysis of the data
in Table 1 using Fisher z transformations. The VG ratio based on zs is
.0149/.021 = .71, where $\hat{\sigma}_e^2 = 1/67$. The mean Fisher z for the
transformed $r_i$ in Table 1 is .56, and the variance of the zs is .021.
Clearly, .71 differs little from the VG ratio of .75 based on rs and
Equation 4 (Table 2), although the difference does indicate a slight bias in
VG procedures based on rs in favor of a finding of cross-situational
consistency.
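
The z-based quantities used in this reanalysis can be sketched as follows.
The function names are our own; the sample size of 70 and the variance of
the zs (.021) are taken from the text, and the individual correlations of
Table 1 are not reproduced here.

```python
import math


def fisher_z(r):
    """Fisher z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def z_error_variance(n):
    """Estimated sampling error variance of z; does not depend on rho."""
    return 1.0 / (n - 3)


# The gap between z and r widens as r increases ("extra weight").
for r in (0.20, 0.50, 0.70):
    print(f"r = {r:.2f}   z = {fisher_z(r):.4f}   z - r = {fisher_z(r) - r:.4f}")

N = 70
var_e = z_error_variance(N)        # 1/67
observed_var_z = 0.021             # variance of the zs reported for Table 1
ratio_z = var_e / observed_var_z   # VG ratio based on zs
print(round(var_e, 4), round(ratio_z, 2))  # 0.0149 0.71
```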

It is important to note that this slight bias overestimates
the bias that would likely be obtained with real selection data
because the illustrative data in Table 1 involve only sampling error and
variation about a mean correlation (i.e., .50) equal to what is considered by
VG proponents to be the "true validity" of many tests (cf. Pearlman et al.,
1980). With real data the VG approach works with a distribution of
correlations whose values have also been attenuated and/or reduced by
criterion (and predictor) measurement error and range restriction in
estimating the predicted variance among the observed correlations that is
due to measurable statistical artifacts (cf. Pearlman et al., 1980; Schmidt
et al., 1980). The corollaries to this point are that (a) the absolute
magnitudes of the $r_i$ will be lower than those in Table 1, from which it
follows that (b) the difference between statistical values based on r and
z, such as the VG ratio, will be reduced because the differences between
values of r and values of z decrease as the absolute value of r
decreases. But then VG techniques are not limited to selection research and
may be employed for distributions in which the rs are of greater magnitude
than typically found in selection studies or the illustration used here.
Indeed, VG analysis based on rs may be inappropriate as a general
method, for its flaws become increasingly evident as the correlations
increase in magnitude and assumptions become more tenuous. Of course, a
simple and statistically more precise alternative exists, namely to use z
coefficients in analyses.

Recommendations  and  Conclusions 

Three major recommendations have been proposed. First, inferences based
on the results of VG research should be less dramatic. Empirical support for
a cross-situational consistency model implies only that this model furnishes
a useful basis for explaining the distribution of observed validities. This
model is not unique, irrefutable, or proven. Second, the decision rule for
the VG ratio should be .90 rather than .75. This recommendation indicates
that 90% of the variance in observed validities should be attributed to
sampling error and other measurable artifacts before one infers that
$\sigma_\rho^2 = 0$. The remaining 10% of the variance is assumed to be due
to unmeasured artifacts. This recommendation is subject to immediate change
as soon as research is obtained pertaining to the unique influences of
criterion problems, clerical errors, and predictor factor structures on
variation among validities. Third, VG analyses should employ Fisher z
coefficients rather than (Pearson) correlation coefficients. The objective
here is to place VG analysis on a sounder statistical footing.

A likely result of the second and third recommendations, especially the
proposed change in the VG decision rule (which applies to the use of Fisher
z coefficients in VG analysis), is that fewer VG analyses will conclude
that cross-situational consistency is a useful model for explaining variation
in validities. This conclusion has the somewhat unfortunate implication that
all validities must therefore vary. There are other views. Heretofore we
have focused on extremes for the purpose of contrast. Now let us ask whether
it is realistic to assume that $\rho$ is different for each situation, or at
least different enough to warrant a separate analysis for each situation.
Probably not. But it is, in our opinion, as realistic as assuming that
$\rho$ is a constant for every situation or, at least, that the $\rho$s do
not vary sufficiently to warrant separate analyses for at least some
situations.

Fortunately, situational specificity and cross-situational consistency as
described in this paper are but two of many possible views. Indeed, the most
useful models for explaining variation in validities probably lie in some
middle ground between these two extremes. This is not the place to attempt
to review a voluminous literature on theoretical models. We suggest only
that attempts to assess situational specificity and cross-situational
consistency will be enhanced by including situational variables in analyses.
Measures representing membership in gross categories such as job families
(cf. Pearlman, 1980) are helpful but lack the explanatory power furnished by
measurement and explicit analysis of specific aspects of situations (e.g.,
stress, leadership) that presumably influence correlations between person
variables and job performance (cf. James, Demaree, & Hater, 1980). An ideal
strategy would be to attempt to develop structural (causal, explanatory)
models of job performance (and attitudes) that involve both person variables
and situational variables.

In closing, although we have been critical of VG procedures, we do
believe that the VG approach is creative and has the potential to make a
contribution to research. Our key concern has been the overdramatic
interpretations of the results of validity generalization analyses in favor
of cross-situational consistency. These concerns apply also to validity
generalization to the extent that "estimated true validities" and
"credibility values" are subject to alternative models involving a
potentially greater degree of situational specificity than indicated by VG
assumptions and decision rules. On the other hand, the issue of
differential validity in the context of validity generalization (cf. Hunter &
Hunter, 1984) is outside the bounds of this discussion. Treatment of
systematic variation within situations due to such things as ethnic, race,
and sex distinctions requires additional thought and statistical modeling.
Finally, as indicated by the preceding discussion, no attempt was made to
exhaust all possible concerns with VG procedures. The issues addressed here
were selected because they were considered to be among the more salient
issues at this time, especially in the context of testing the goodness of fit
of causal models for validities and validity distributions.

*A recent study by Schmidt and Hunter (1984) applied VG procedures to
validity data obtained from four different cohorts of stenographers (K = 4)
from the same organization. The objective of the study was to demonstrate
that "If the statistical artifacts operating are the same, observed
validities will vary as much within the same setting as they do across
settings . . . merely as a result of artifacts such as sampling error"
(Schmidt & Hunter, 1984, p. 320). Unfortunately, a sample of only four
correlations was available for each of five different tests. The instability
of results based on such a small sample of rs is indicated by the values of
the VG ratio (i.e., $\sigma_e^2 / s_r^2$) for the five tests, which were
4.0, 1.31, .709, .422, and .81. (Schmidt and Hunter [1984] reported results
in terms of a "new" ratio, namely $s_r / \sigma_e$, which took values of
.50, .88, 1.19, 1.54, and 1.11 for the five tests, respectively.) One should
probably question the generalizability of data which, based on the "old" VG
ratio, suggest for one test that 400% of the observed variance for
correlations is accounted for by sampling error. Thus, we will not address
this study again in this paper.
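
The two sets of ratios quoted in this footnote can be checked against each other. The sketch below assumes that the "old" ratio is error variance over observed variance and that the "new" ratio is its inverse square root (observed standard deviation over error standard deviation). That relationship is inferred from the quoted numbers, not stated in the footnote, and the function names are illustrative only.

```python
import math

# "Old" VG ratios quoted in the footnote
# (assumed: sampling-error variance / observed variance of the rs).
old_ratios = [4.0, 1.31, 0.709, 0.422, 0.81]

# "New" ratios quoted in the footnote, in the same test order.
reported_new = [0.50, 0.88, 1.19, 1.54, 1.11]


def new_from_old(old_ratio):
    """Assumed relationship: new ratio = 1 / sqrt(old ratio)."""
    return 1.0 / math.sqrt(old_ratio)


def percent_variance_from_error(old_ratio):
    """Old ratio expressed as a percentage of observed variance."""
    return 100.0 * old_ratio


for old, reported in zip(old_ratios, reported_new):
    implied = new_from_old(old)
    share = percent_variance_from_error(old)
    print(f"old={old:<6} implied new={implied:.3f} "
          f"reported new={reported:.2f} error share={share:.0f}%")
```

Under this assumption the implied and reported "new" values agree to within about .01, which is consistent with rounding in the quoted figures. An "old" ratio above 1.0 (as for the first two tests) means that the estimated sampling-error variance exceeds the observed variance of the four correlations.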

Table 3

Population Correlations and Expected Values of Sampling Errors

Population Correlation ($\rho_i$)    Expected Value of Error [$E(e_i)$]

            .70                                 -.00259
            .65                                 -.00272
            .60                                 -.00278
            .55                                 -.00278
            .50                                 -.00272
            .45                                 -.00260
            .40                                 -.00243
            .35                                 -.00223
            .30                                 -.00198
            .25                                 -.00170
            .20                                 -.00139
            .15                                 -.00106
            .10                                 -.00072
            .05                                 -.00036

Note. $n_i$ = 70 for all samples.
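
The tabled values can be approximated with a short calculation. The sketch below assumes the common first-order approximation for the bias of a sample correlation, E(e) = -rho(1 - rho^2) / (2(n - 1)). This section does not state which formula was used, so the formula and the function name are assumptions. With n = 70, it agrees with the tabled entries to within rounding at the fifth decimal place.

```python
def expected_sampling_error(rho, n):
    """First-order approximation to E(r) - rho for samples of size n.

    Assumed formula (not stated in this section):
        E(e) = -rho * (1 - rho**2) / (2 * (n - 1))
    """
    return -rho * (1.0 - rho ** 2) / (2.0 * (n - 1))


N = 70  # sample size given in the table note
RHOS = [0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40,
        0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]

# One (rho, expected error) pair per table row.
table = [(rho, expected_sampling_error(rho, N)) for rho in RHOS]

for rho, bias in table:
    print(f"{rho:.2f}    {bias:.5f}")
```

The expected error is negative throughout and small relative to the population correlation at this sample size. Its magnitude is largest for population correlations of roughly .55 to .60.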